AITopics

arXiv.org Artificial Intelligence, Nov-26-2025

Active learning promises an optimal procedure for selecting training samples when constructing machine learning models. It often relies on minimizing the model's variance, which is assumed to decrease the prediction error. In practice, however, it is frequently even less efficient than pure random sampling. Motivated by the bias-variance decomposition, we propose to minimize the model's bias instead of its variance. By doing so, we are able to almost exactly match the best-case error over all possible greedy sample selection procedures for a relevant application. Our bias approximation is based on cheap-to-calculate low-fidelity data, as known from $\Delta$-ML or multifidelity machine learning. We demonstrate our approach on a wider class of applications in quantum chemistry, including the prediction of excitation energies and ab initio potential energy surfaces. Here, the proposed method reduces training data consumption by up to an order of magnitude compared to standard active learning.
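To make the idea of a bias proxy built from low-fidelity data concrete, here is a minimal Python sketch of greedy sample selection. It is a hypothetical illustration, not the authors' algorithm: the Gaussian-kernel ridge regression model, the function names (`krr_fit_predict`, `greedy_bias_proxy_selection`), and the selection rule "add the pool point where a model trained on the current selection deviates most from the cheap low-fidelity labels" are all assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def krr_fit_predict(x_train, y_train, x_eval, sigma=1.0, lam=1e-6):
    """Fit kernel ridge regression on (x_train, y_train) and predict at x_eval."""
    k = gaussian_kernel(x_train, x_train, sigma)
    alpha = np.linalg.solve(k + lam * np.eye(len(x_train)), y_train)
    return gaussian_kernel(x_eval, x_train, sigma) @ alpha

def greedy_bias_proxy_selection(x_pool, y_low, n_select, sigma=1.0, seed_index=0):
    """Hypothetical greedy selection: the residual of the current model against
    cheap low-fidelity labels over the whole pool serves as a bias proxy."""
    selected = [seed_index]
    while len(selected) < n_select:
        pred = krr_fit_predict(x_pool[selected], y_low[selected], x_pool, sigma)
        residual = np.abs(pred - y_low)
        residual[selected] = -np.inf  # never pick the same point twice
        selected.append(int(np.argmax(residual)))
    return selected
```

Because only low-fidelity labels are queried during selection, expensive high-fidelity calculations would be needed only for the returned indices.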

artificial intelligence, machine learning, variance

arXiv:2508.15577

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Investigating Data Hierarchies in Multifidelity Machine Learning for Excitation Energies

arXiv.org Artificial Intelligence, Oct-15-2024

Recent progress in machine learning (ML) has made high-accuracy quantum chemistry (QC) calculations more accessible. Of particular interest are multifidelity machine learning (MFML) methods, in which training data of differing accuracies, or fidelities, are used. These methods usually employ a fixed scaling factor, $\gamma$, to relate the number of training samples across different fidelities; this factor reflects the cost and assumed sparsity of the data. This study investigates how modifying $\gamma$ affects model efficiency and accuracy for the prediction of vertical excitation energies on the QeMFi benchmark dataset. Further, this work introduces QC-compute-time-informed scaling factors, denoted $\theta$, that vary with the QC compute times at the different fidelities. A novel error metric, error contours of MFML, is proposed to provide a comprehensive view of the contribution of each fidelity to the model error. The results indicate that high model accuracy can be achieved with just 2 training samples at the target fidelity when larger numbers of samples from lower fidelities are used. This is further illustrated through a novel concept, the $\Gamma$-curve, which compares model error against the time cost of generating training samples and demonstrates that multifidelity models can achieve high accuracy while minimizing training data costs.
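The sample-count arithmetic behind a scaling factor can be sketched in a few lines of Python. The sketch assumes fidelities ordered from lowest to highest, with each step down in fidelity multiplying the training-set size by $\gamma$. The time-informed variant uses a hypothetical definition, $\theta_f = t_{f+1}/t_f$ (the ratio of per-sample compute times of adjacent fidelities), which is our assumption and not necessarily the paper's. The total time cost is the kind of quantity a $\Gamma$-curve plots model error against.

```python
from typing import List, Sequence

def samples_per_fidelity(n_target: int, n_fidelities: int, gamma: float = 2.0) -> List[int]:
    """Training-set size per fidelity, lowest fidelity first.

    Each step down in fidelity multiplies the count by gamma, so the
    target (highest) fidelity receives n_target samples.
    """
    return [round(n_target * gamma ** (n_fidelities - 1 - f)) for f in range(n_fidelities)]

def samples_time_informed(n_target: int, time_per_sample: Sequence[float]) -> List[int]:
    """Hypothetical time-informed counts using theta_f = t_{f+1} / t_f (assumed definition)."""
    counts = [n_target]
    for f in range(len(time_per_sample) - 2, -1, -1):
        theta = time_per_sample[f + 1] / time_per_sample[f]
        counts.insert(0, round(counts[0] * theta))
    return counts

def training_time_cost(counts: Sequence[int], time_per_sample: Sequence[float]) -> float:
    """Total QC compute time needed to generate the multifidelity training set."""
    return float(sum(n * t for n, t in zip(counts, time_per_sample)))

print(samples_per_fidelity(n_target=2, n_fidelities=5, gamma=2.0))  # [32, 16, 8, 4, 2]
```

Under the assumed definition of $\theta$, the product $n_f t_f$ stays (up to rounding) the same at every fidelity, so each fidelity consumes roughly the same share of the compute budget.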

artificial intelligence, fidelity, machine learning

arXiv:2410.11392

Country: Europe > Germany (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Assessing Non-Nested Configurations of Multifidelity Machine Learning for Quantum-Chemical Properties

arXiv.org Artificial Intelligence, Jul-24-2024

Multifidelity machine learning (MFML) for quantum-chemical (QC) properties has developed strongly in recent years. The method has been shown to reduce the cost of generating training data for high-accuracy, low-cost ML models. In such a setup, ML models are trained on molecular geometries and a property of interest computed at various levels of computational-chemistry accuracy, or fidelities. These models are then combined to form the MFML model. Some multifidelity models require the training data to be nested; that is, the same molecular geometries are used to calculate the property across all fidelities. In these models, the nesting requirement restricts the kind of sampling that can be performed when selecting training samples at the different fidelities. This work assesses the use of non-nested training data for two of these multifidelity methods, namely MFML and optimized MFML (o-MFML). The assessment is carried out for the prediction of ground-state energies and first vertical excitation energies of the diverse collection of molecules in the CheMFi dataset. The results indicate that the MFML method still requires a nested structure of training data across the fidelities. However, the o-MFML method shows promising results for non-nested multifidelity training data, with model errors comparable to those of nested configurations.
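The difference between nested and non-nested configurations can be illustrated with index sets over a pool of molecules. In this minimal Python sketch (all function names are hypothetical), the nested configuration draws one random ordering of the pool and takes successively shorter prefixes, so every higher-fidelity set is contained in the next lower one; the non-nested configuration samples each fidelity independently. Random sampling is assumed for both; the paper's actual sampling procedure may differ.

```python
import random
from typing import List, Sequence

def nested_indices(n_molecules: int, counts: Sequence[int], seed: int = 0) -> List[List[int]]:
    """Nested configuration: one random ordering, successively shorter prefixes.

    counts must be non-increasing, lowest fidelity (largest set) first.
    """
    rng = random.Random(seed)
    order = rng.sample(range(n_molecules), counts[0])
    return [sorted(order[:n]) for n in counts]

def non_nested_indices(n_molecules: int, counts: Sequence[int], seed: int = 0) -> List[List[int]]:
    """Non-nested configuration: each fidelity is sampled independently."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_molecules), n)) for n in counts]

def is_nested(index_sets: Sequence[Sequence[int]]) -> bool:
    """True if every higher-fidelity set is contained in the next lower one."""
    return all(set(hi) <= set(lo) for lo, hi in zip(index_sets, index_sets[1:]))
```

Independent sampling frees each fidelity's selection from the others, which is the flexibility the non-nested configuration is meant to provide.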

configuration, fidelity, training data

arXiv:2407.17087

Country: Europe > Germany (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

CheMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules

arXiv.org Artificial Intelligence, Jun-20-2024

Progress in both machine learning (ML) and conventional quantum chemistry (QC) computational methods has resulted in high-accuracy ML models for QC properties ranging from atomization energies to excitation energies. Various datasets such as MD17, MD22, and WS22, which consist of properties calculated at some level of QC method, or fidelity, have been generated to benchmark such ML models. The term fidelity refers to how closely the chosen QC method reproduces the true value of the property. The higher the fidelity, the more accurate the calculated property, albeit at a higher computational cost. Research in multifidelity ML (MFML) methods, where ML models are trained on data from more than one numerical QC method, has demonstrated the advantages of such models over single-fidelity methods. Research in this direction is active for diverse applications ranging from energy band gaps to excitation energies. A major hurdle for research in this field is the lack of a diverse multifidelity dataset for benchmarking. Here, we present the quantum Chemistry MultiFidelity (CheMFi) dataset, a comprehensive multifidelity dataset drawn from the WS22 molecular conformations and consisting of five fidelities calculated with the TD-DFT formalism. The fidelities differ in their choice of basis set: STO-3G, 3-21G, 6-31G, def2-SVP, and def2-TZVP. CheMFi offers the community a variety of QC properties, including vertical excitation energies, oscillator strengths, molecular dipole moments, and ground-state energies. In addition to the dataset, multifidelity benchmarks are set with state-of-the-art MFML and optimized-MFML
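The fidelity hierarchy of CheMFi can be represented as an ordered tuple of basis sets. The Python sketch below assumes that the order in which the basis sets are listed above runs from lowest to highest fidelity; the per-conformation dictionary layout and the function names are hypothetical, not the dataset's actual file format. It computes the differences of one property between adjacent fidelities, the kind of correction term a $\Delta$-ML-style model would learn.

```python
from typing import Dict

# Basis sets of the five CheMFi fidelities, assumed ordered lowest to highest.
FIDELITIES = ("STO-3G", "3-21G", "6-31G", "def2-SVP", "def2-TZVP")

def fidelity_rank(basis: str) -> int:
    """0 for the lowest fidelity, len(FIDELITIES) - 1 for the highest."""
    return FIDELITIES.index(basis)

def successive_differences(values: Dict[str, float]) -> Dict[str, float]:
    """Difference of one property between each pair of adjacent fidelities
    for a single conformation (hypothetical layout: basis -> value)."""
    return {
        f"{hi} - {lo}": values[hi] - values[lo]
        for lo, hi in zip(FIDELITIES, FIDELITIES[1:])
    }
```

Summing the four adjacent differences telescopes to the def2-TZVP value minus the STO-3G value, which is one way to sanity-check such a hierarchy.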

dataset, fidelity, molecule

arXiv:2406.14149

Country: Europe > Germany (0.04)

Genre: Research Report (0.50)

Industry: Materials > Chemicals (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine Learning, Jan-18-2023

Electronic excited states in deep variational Monte Carlo

Entwistle, Mike, Schätzle, Zeno, Erdman, Paolo A., Hermann, Jan, Noé, Frank

Obtaining accurate ground and low-lying excited states of electronic systems is crucial in a multitude of important applications. One ab initio method for solving the Schrödinger equation that scales favorably for large systems is variational quantum Monte Carlo (QMC). The recently introduced deep QMC approach uses ansatzes represented by deep neural networks and generates nearly exact ground-state solutions for molecules containing up to a few dozen electrons, with the potential to scale to much larger systems where other highly accurate methods are not feasible. In this paper, we extend one such ansatz (PauliNet) to compute electronic excited states. We demonstrate our method on various small atoms and molecules and consistently achieve high accuracy for low-lying states. To highlight the method's potential, we compute the first excited state of the much larger benzene molecule, as well as the conical intersection of ethylene, with PauliNet matching results of more expensive high-level methods.
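The core loop of variational QMC, Metropolis sampling of $|\psi|^2$ followed by averaging the local energy $E_L = (H\psi)/\psi$, can be shown on a toy problem. This minimal Python sketch is not PauliNet: it assumes a one-dimensional harmonic oscillator, $H = -\tfrac{1}{2}\,d^2/dx^2 + \tfrac{1}{2}x^2$, with a hand-chosen ansatz $\psi = x^k e^{-\alpha x^2}$ in place of a deep neural network, and it keeps the excited state ($k = 1$) orthogonal to the ground state ($k = 0$) purely by parity rather than by the paper's method. At $\alpha = 0.5$ both ansatzes are exact, giving energies of 0.5 and 1.5 with zero variance; other values of $\alpha$ give higher energies.

```python
import numpy as np

def local_energy(x, alpha, excited):
    """E_L = (H psi) / psi for psi = x^k exp(-alpha x^2), k = 1 if excited else 0."""
    base = 3.0 * alpha if excited else alpha
    return base + x**2 * (0.5 - 2.0 * alpha**2)

def log_prob(x, alpha, excited):
    """log |psi|^2 up to an additive constant."""
    lp = -2.0 * alpha * x**2
    if excited:
        lp = lp + 2.0 * np.log(np.abs(x) + 1e-300)
    return lp

def vmc_energy(alpha, excited=False, n_walkers=2000, n_steps=400, step=1.0, seed=0):
    """Metropolis sampling of |psi|^2 and averaging of the local energy."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_walkers)
    energies = []
    for i in range(n_steps):
        x_new = x + step * rng.uniform(-1.0, 1.0, size=n_walkers)
        log_ratio = log_prob(x_new, alpha, excited) - log_prob(x, alpha, excited)
        accept = np.log(rng.uniform(size=n_walkers)) < log_ratio
        x = np.where(accept, x_new, x)
        if i >= n_steps // 2:  # discard the first half as burn-in
            energies.append(local_energy(x, alpha, excited).mean())
    return float(np.mean(energies))
```

In deep QMC the fixed ansatz is replaced by a neural network whose parameters are optimized to lower the sampled energy; the sampling and local-energy steps keep the same structure.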

artificial intelligence, excitation energy, machine learning

doi: 10.1038/s41467-022-35534-5

arXiv:2203.09472

Country:

Europe > Germany > Berlin (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.57)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Machine Learning, May-28-2017

Direct Mapping Hidden Excited State Interaction Patterns from ab initio Dynamics and Its Implications on Force Field Development

Liu, Fang, Du, Likai, Zhang, Dongju, Gao, Jun

The excited states of polyatomic systems are rather complex and often exhibit meta-stable dynamical behaviors. Static analysis of reaction pathways often fails to sufficiently characterize excited-state motions because of their highly non-equilibrium nature. Here, we propose a time-series-guided clustering algorithm to extract the most relevant meta-stable patterns directly from ab initio dynamics trajectories. Based on these meta-stable patterns, we suggest an interpolation scheme that uses only a finite set of known patterns to accurately predict the ground- and excited-state properties along the entire dynamics trajectories. As illustrated with the example of sinapic acids, the estimation errors for the ground and excited states are very close, which indicates that ground- and excited-state molecular properties can be predicted with similar accuracy. These results may offer insights into constructing an excited-state force field with energy terms compatible with traditional ones.
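The two steps described above, extracting representative patterns from trajectory frames and interpolating properties from a finite set of known patterns, can be sketched in Python. This is a stand-in under stated assumptions, not the authors' algorithm: plain k-means replaces the time-series-guided clustering, each frame is assumed to be described by a fixed-length feature vector, and inverse-distance weighting is a hypothetical choice of interpolation scheme. All function names are illustrative.

```python
import numpy as np

def kmeans(frames, k, n_iter=50, seed=0):
    """Plain k-means (stand-in for the time-series-guided clustering)."""
    rng = np.random.default_rng(seed)
    centers = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(frames[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = frames[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, labels

def representative_frames(frames, centers):
    """Index of the trajectory frame closest to each cluster center."""
    dist = np.linalg.norm(frames[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=0)

def idw_predict(frames, rep_idx, rep_values, power=2.0, eps=1e-12):
    """Inverse-distance-weighted interpolation from the representative frames."""
    dist = np.linalg.norm(frames[:, None, :] - frames[rep_idx][None, :, :], axis=2)
    weights = 1.0 / (dist + eps) ** power
    return (weights @ rep_values) / weights.sum(axis=1)
```

In this setup only the representative frames would need ab initio ground- or excited-state properties; every other frame receives an interpolated estimate.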

artificial intelligence, stable pattern, upstream oil & gas

doi: 10.1038/s41598-017-09347-2

arXiv:1705.09919

Country:

North America > United States (0.28)
Asia > China (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.68)
Energy > Oil & Gas > Upstream (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)